162 research outputs found

    Finding an unknown number of multivariate outliers

    Get PDF
    We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples

    A Parametric Framework for the Comparison of Methods of Very Robust Regression

    Full text link
    There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter λ\lambda that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of λ\lambda, the variance and squared bias of five estimators and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining knowledge and better understanding of the properties of robust estimators.Comment: Published in at http://dx.doi.org/10.1214/13-STS437 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Volatility in the Italian Stock Market: An Empirical Study

    Get PDF
    We study the volatility of the MIB30–stock–index high–frequency data from November 28, 1994 through September 15, 1995. Our aim is to empirically characterize the volatility random walk in the framework of continuous–time finance. To this end, we compute the index volatility by means of the log–return standard deviation. We choose an hourly time window in order to investigate intraday properties of volatility. A periodic component is found for the hourly time window, in agreement with previous observations. Fluctuations are studied by means of detrended fluctuation analysis, and we detect long–range correlations. Volatility values are log–stable distributed. We discuss the implications of these results for stochastic volatility modelling.volatility; stochastic processes; random walk; statistical finance

    The Forward Search for Very Large Datasets

    Get PDF
    The identification of atypical observations and the immunization of data analysis against both outliers and failures of modeling are important aspects of modern statistics. The forward search is a graphics rich approach that leads to the formal detection of outliers and to the detection of model inadequacy combined with suggestions for model enhancement. The key idea is to monitor quantities of interest, such as parameter estimates and test statistics, as the model is fitted to data subsets of increasing size. In this paper we propose some computational improvements of the forward search algorithm and we provide a recursive implementation of the procedure which exploits the information of the previous step. The output is a set of efficient routines for fast updating of the model parameter estimates, which do not require any data sorting, and fast computation of likelihood contributions, which do not require matrix inversion or qr decomposition. It is shown that the new algorithms enable a reduction of the computation time by more than 80%. Furthemore, the running time now increases almost linearly with the sample size. All the routines described in this paper are included in the FSDA toolbox for MATLAB which is freely downloadable from the internet.JRC.G.2-Global security and crisis managemen

    Robust Bayesian regression with the forward search: theory and data analysis

    Get PDF
    The frequentist forward search yields a flexible and informative form of robust regression. The device of fictitious observations provides a natural way to include prior information in the search. However, this extension is not straightforward, requiring weighted regression. Bayesian versions of forward plots are used to exhibit the presence of multiple outliers in a data set from banking with 1903 observations and nine explanatory variables which shows, in this case, the clear advantages from including prior information in the forward search. Use of observation weights from frequentist robust regression is shown to provide a simple general method for robust Bayesian regression

    The box-cox transformation: review and extensions

    Get PDF
    The Box-Cox power transformation family for non-negative responses in linear models has a long and interesting history in both statistical practice and theory, which we summarize. The relationship between generalized linear models and log transformed data is illustrated. Extensions investigated include the transform both sides model and the Yeo-Johnson transformation for observations that can be positive or negative. The paper also describes an extended Yeo-Johnson transformation that allows positive and negative responses to have different power transformations. Analyses of data show this to be necessary. Robustness enters in the fan plot for which the forward search provides an ordering of the data. Plausible transformations are checked with an extended fan plot. These procedures are used to compare parametric power transformations with nonparametric transformations produced by smoothing

    The analysis of transformations for profit-and-loss data

    Get PDF
    We analyse data on the performance of investment funds, 99 out of 309 of which report a loss, and on the profitability of 1405 firms, 407 of which report losses. The problem in both cases is to use regression to predict performance from sets of explanatory variables. In one case, it is clear from scatter plots of the data that the negative responses have a lower variance than the positive responses and a different relationship with the explanatory variables. Because the data include negative responses, the Box–Cox transformation cannot be used. We develop a robust version of an extension to the Yeo–Johnson transformation which allows different transformations for positive and negative responses. Tests and graphical methods from our robust analysis enable the detection of outliers, the assessment of values of the two transformation parameters and the building of simple regression models. Performance comparisons are made with non-parametric transformations

    Building Regression Models with the Forward Search

    Get PDF
    We give an example of the use of the forward search in building a regression model. The standard backwards elimination of variables is supplemented by forward plots of added variable t statistics that exhibit the effect of each observation on the process of model building. Attention is also paid to the effect of individual observations on selection of a transformation. Variable selection using AIC is mentioned, as is the analysis of multivariate data

    Simulating mixtures of multivariate data with fixed cluster overlap in FSDA library

    Get PDF
    We extend the capabilities of MixSim, a framework which is useful for evaluating the performance of clustering algorithms, on the basis of measures of agreement between data partitioning and flexible generation methods for data, outliers and noise. The peculiarity of the method is that data are simulated from normal mixture distributions on the basis of pre-specified synthesis statistics on an overlap measure, defined as a sum of pairwise misclassification probabilities. We provide new tools which enable us to control additional overlapping statistics and departures from homogeneity and sphericity among groups, together with new outlier contamination schemes. The output of this extension is a more flexible framework for generation of data to better address modern robust clustering scenarios in presence of possible contamination. We also study the properties and the implications that this new way of simulating clustering data entails in terms of coverage of space, goodness of fit to theoretical distributions, and degree of convergence to nominal values. We demonstrate the new features using our MATLAB implementation that we have integrated in the Flexible Statistics for Data Analysis (FSDA) toolbox for MATLAB. With MixSim, FSDA now integrates in the same environment state of the art robust clustering algorithms and principled routines for their evaluation and calibration. A spin off of our work is a general complex routine, translated from C language to MATLAB, to compute the distribution function of a linear combinations of non central χ2\chi ^2?2 random variables which is at the core of MixSim and has its own interest for many test statistics
    • …
    corecore